DETAILED ACTION
Response to Arguments 

1. Applicant's arguments with respect to claims 1, 13, 14, and 26-30 have been
considered but are moot in view of the new ground(s) of rejection. 

Claim Rejections - 35 USC § 103 

2. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

3. Claims 1-5, 7-10, 13-18, 22, 23 and 26-30 are rejected under 35 U.S.C. 103(a) as
being unpatentable over Rigazio et al. US 6182039 B1 (hereinafter Rigazio) in view of 
Pentheroudakis et al. US 7092871 B2 (hereinafter Pentheroudakis). 

Re claims 1, 13, 14, and 26-30, Rigazio teaches a language model generation and
accumulation apparatus that generates and accumulates language models for speech 
recognition, the apparatus comprising: 

a lower-level N-gram language model (Col. 6 lines 11-20) generation and 
accumulation unit operable to generate and accumulate a lower-level N-gram language 
model that is obtained by modeling a first sequence of words (Col. 4 lines 4-29) having 
a specific linguistic property (Col. 4 lines 30-55 & Fig. 2); 

a higher-level N-gram language model (Col. 6 lines 11-20) generation and
accumulation unit operable to generate and accumulate a higher-level N-gram
language model that is obtained by modeling the first sequence of words modeled in the 
lower-level N-gram language model (Col. 4 lines 4-29) 

However, Rigazio fails to teach a word string class and a plurality of texts as a
second sequence of words that includes the word string class (Pentheroudakis Col. 6
line 44 - Col. 7 line 14).

Pentheroudakis teaches a classification of words or a group of words that can 
represent a title, where a lexicon lookup engine 208 first accesses lexicon 212, which 
may illustratively be a computer readable dictionary, or simply a word list, to determine 
whether the tokens in the proposed segmentation are recognized by, or contained in, 
lexicon 212. In addition, linguistic knowledge component 206 may include 
morphological analyzer 210. For example, if lexicon 212 contains only uninflected word 
forms (i.e., lemmas), then a morphological analysis is desirable to reduce, say, the token
"brothers-in-law" to the dictionary form "brother-in-law." Pentheroudakis teaches that
morphological analyzer 210 can also do more than simply convert words to uninflected
forms. For example, morphological analyzer 210 also illustratively includes a number 
morphological component 216 and a punctuation morphological component 218. These 
two components illustratively convert numbers and punctuation characters to values 
which will be recognized by lexicon 212 as well. Additionally, if a sub-token is 
successfully looked up in lexicon 212, and thus validated by linguistic knowledge 
component 206, that sub-token will not be further broken down. Instead, it is simply
passed back to tokenizer engine 202 along with an indication that it has been validated. 

Pentheroudakis teaches that linguistic knowledge component 206 also illustratively
invokes morphological analyzer 210 to assist in recognizing "virtual" words in the 
language (tokens that need to be treated as single words by the system, even though 
they are not listed in the dictionary). For instance, tokens such as numbers, electronic 
mail addresses, drive path names, URLs, emoticons, and the like, can be represented 
as a single word. Morphological analyzer 210 can assist in recognizing each segment 
as an actual word, or as a virtual word, by identifying it as a virtual word or reducing it to 
a normalized form for recognition in lexicon 212. 
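
The lookup behavior summarized above can be illustrated with a minimal sketch. It is not taken from Pentheroudakis; the lexicon contents, the class labels such as <EMAIL>, the regular expressions, and the function names are all hypothetical, and the lemmatizer handles only the single pattern needed for the "brothers-in-law" example.

```python
import re

# Hypothetical lexicon of uninflected forms (lemmas); not taken from the reference.
LEXICON = {"brother-in-law", "send", "mail", "to"}

# Hypothetical patterns for "virtual" words, each mapped to an assumed class label.
VIRTUAL_PATTERNS = [
    ("<URL>", re.compile(r"^https?://\S+$")),
    ("<EMAIL>", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("<NUMBER>", re.compile(r"^\d+([.,]\d+)*$")),
]


def lemmatize(token):
    """Toy reduction: strips a plural 's' from the head of a hyphenated compound."""
    head, sep, rest = token.partition("-")
    if sep and head.endswith("s"):
        return head[:-1] + sep + rest
    return token


def validate(token):
    """Return (normalized_form, kind); kind is 'lexical', 'virtual', or 'unknown'."""
    lower = token.lower()
    if lower in LEXICON:
        return lower, "lexical"
    for label, pattern in VIRTUAL_PATTERNS:
        if pattern.match(token):
            return label, "virtual"
    lemma = lemmatize(lower)
    if lemma in LEXICON:
        return lemma, "lexical"
    return token, "unknown"
```

In this sketch a token that is validated, either directly or after reduction to a normalized form, is returned as a single unit and is not broken down further.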

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to model text or a plurality of texts so as to include a word string
class. Modeling a group of words to represent one word allows for the recognition of a
unique sequence of words that are not themselves considered to be words but that can
be categorized as a single word. Creating a system to recognize groups of words
allows for a reduced amount of error during speech recognition, where punctuation (e.g.,
"-", "/", ".", etc.) will be considered in a manner in which any title or name can be
recognized and classified.
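
For illustration only, the two-level arrangement recited in claim 1 can be sketched as two bigram models: a lower-level model over the words inside a word string class and a higher-level model over sentences in which the class appears as a single token. The training data, the <TITLE> label, the unsmoothed maximum-likelihood estimate, and the choice to add the two log-probabilities are assumptions made for this sketch; none of them is drawn from the claims or from either reference.

```python
import math
from collections import Counter


def bigram_counts(sequences):
    """Count bigram contexts and bigrams over sequences padded with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams


def log_prob(seq, model):
    """Unsmoothed maximum-likelihood bigram log-probability (natural log)."""
    contexts, bigrams = model
    padded = ["<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        count = bigrams[(prev, cur)]
        if count == 0:
            return float("-inf")  # unseen bigram; a real system would smooth
        total += math.log(count / contexts[prev])
    return total


# Hypothetical training data: titles form the word string class <TITLE>.
titles = [["star", "wars"], ["star", "trek"]]
sentences = [["record", "<TITLE>", "tonight"], ["play", "<TITLE>", "tonight"]]

lower = bigram_counts(titles)      # models word sequences inside the class
higher = bigram_counts(sentences)  # models sentences with the class as one token

# Score "record star wars tonight" as higher-level score plus lower-level score.
score = log_prob(["record", "<TITLE>", "tonight"], higher) + log_prob(["star", "wars"], lower)
```

With this toy data each level assigns probability 0.5, so the combined score corresponds to a probability of 0.25.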

Re claims 2 and 15, Rigazio teaches the language model generation and 
accumulation apparatus according to Claim 1, wherein the higher-level N-gram 
language model (Col. 6 lines 11-20) generation and accumulation unit and the lower-
level N-gram language model generation and accumulation unit generate the respective
language models (Col. 4 lines 4-55 & Fig. 2), using different corpuses (Col. 7 line 21 - 
Col. 8 line 19). 

Re claims 3 and 16, Rigazio teaches the language model generation and 
accumulation apparatus according to Claim 2, wherein the lower-level N-gram language 
model (Col. 6 lines 11-20) generation and accumulation unit includes a corpus update 
unit operable to update the corpus (Col. 12 lines 23-41) for the lower-level N-gram 
language model (Col. 4 lines 4-55 & Fig. 2), 

the lower-level N-gram language model generation and accumulation unit 
updates the lower-level N-gram language model based on the updated corpus (Col. 12 
lines 23-41), and generates the updated lower-level N-gram language model (Col. 4 
lines 4-55 & Fig. 2). 

Re claims 4 and 17, Rigazio teaches the language model generation and accumulation apparatus
according to Claim 1, wherein the lower-level N-gram language model (Col. 6 lines 11- 
20) generation and accumulation unit analyzes the first sequence of words (Col. 4 lines 
4-55 & Fig. 2), and generates the lower-level N-gram language model by modeling each 
sequence of the one or more morphemes based on the word string class (Col. 4 lines 4- 
55 & Fig. 2). 

However, Rigazio fails to teach analyzing the first sequence of words within the word
string class into one or more morphemes that are the smallest language units having
meanings (Pentheroudakis Col. 6 line 44 - Col. 7 line 14).

Pentheroudakis teaches a classification of words or a group of words that can 
represent a title, where a lexicon lookup engine 208 first accesses lexicon 212, which 
may illustratively be a computer readable dictionary, or simply a word list, to determine 
whether the tokens in the proposed segmentation are recognized by, or contained in, 
lexicon 212. In addition, linguistic knowledge component 206 may include 
morphological analyzer 210. For example, if lexicon 212 contains only uninflected word 
forms (i.e., lemmas), then a morphological analysis is desirable to reduce, say, the token
"brothers-in-law" to the dictionary form "brother-in-law." Pentheroudakis teaches that
morphological analyzer 210 can also do more than simply convert words to uninflected
forms. For example, morphological analyzer 210 also illustratively includes a number 
morphological component 216 and a punctuation morphological component 218. These 
two components illustratively convert numbers and punctuation characters to values 
which will be recognized by lexicon 212 as well. Additionally, if a sub-token is 
successfully looked up in lexicon 212, and thus validated by linguistic knowledge 
component 206, that sub-token will not be further broken down. Instead, it is simply 
passed back to tokenizer engine 202 along with an indication that it has been validated. 

Pentheroudakis teaches that linguistic knowledge component 206 also illustratively
invokes morphological analyzer 210 to assist in recognizing "virtual" words in the 
language (tokens that need to be treated as single words by the system, even though 
they are not listed in the dictionary). For instance, tokens such as numbers, electronic
mail addresses, drive path names, URLs, emoticons, and the like, can be represented 
as a single word. Morphological analyzer 210 can assist in recognizing each segment 
as an actual word, or as a virtual word, by identifying it as a virtual word or reducing it to 
a normalized form for recognition in lexicon 212. 

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to analyze a word string class into morphemes, the smallest
language units having meaning. Morphologically analyzing text or a plurality of texts
allows for a diverse recognition of data, where punctuation that links several
letters/words to form a group of words can be taken into account. Modeling a group of
words to represent one word allows for the recognition of a unique sequence of words
that are not themselves considered to be words but that can be categorized as a single
word. Creating a system to recognize groups of words allows for a reduced amount of
error during speech recognition, where punctuation (e.g., "/", etc.) will be considered in
a manner in which any title or name can be recognized and classified.

Re claims 5 and 18, Rigazio teaches the language model generation and accumulation
apparatus according to Claim 1, wherein the higher-level N-gram language model (Col. 6
lines 11-20) generation and accumulation unit, and then generates the higher-level N-gram
language model by modeling (Col. 4 lines 30-55 & Fig. 2)

a sequence made up of the virtual word and the other words (Pentheroudakis
Col. 6 line 44 - Col. 7 line 14),

substitutes the word string class with a virtual word (Pentheroudakis Col. 6 line
44 - Col. 7 line 14),

the word string class being included in each of the plurality of texts analyzed into 
morphemes (Pentheroudakis Col. 6 line 44 - Col. 7 line 14). 

Pentheroudakis teaches a classification of words or a group of words that can 
represent a title, where a lexicon lookup engine 208 first accesses lexicon 212, which 
may illustratively be a computer readable dictionary, or simply a word list, to determine 
whether the tokens in the proposed segmentation are recognized by, or contained in, 
lexicon 212. In addition, linguistic knowledge component 206 may include 
morphological analyzer 210. For example, if lexicon 212 contains only uninflected word
forms (i.e., lemmas), then a morphological analysis is desirable to reduce, say, the token
"brothers-in-law" to the dictionary form "brother-in-law." Pentheroudakis teaches that
morphological analyzer 210 can also do more than simply convert words to uninflected
forms. For example, morphological analyzer 210 also illustratively includes a number 
morphological component 216 and a punctuation morphological component 218. These 
two components illustratively convert numbers and punctuation characters to values 
which will be recognized by lexicon 212 as well. Additionally, if a sub-token is 
successfully looked up in lexicon 212, and thus validated by linguistic knowledge 
component 206, that sub-token will not be further broken down. Instead, it is simply 
passed back to tokenizer engine 202 along with an indication that it has been validated. 

Pentheroudakis teaches that linguistic knowledge component 206 also illustratively
invokes morphological analyzer 210 to assist in recognizing "virtual" words in the
language (tokens that need to be treated as single words by the system, even though
they are not listed in the dictionary). For instance, tokens such as numbers, electronic 
mail addresses, drive path names, URLs, emoticons, and the like, can be represented 
as a single word. Morphological analyzer 210 can assist in recognizing each segment 
as an actual word, or as a virtual word, by identifying it as a virtual word or reducing it to 
a normalized form for recognition in lexicon 212. 

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to analyze a word string class into morphemes, the smallest
language units having meaning, where a virtual word is substituted for a sequence of
words. Morphologically analyzing text or a plurality of texts allows for a diverse
recognition of data, where punctuation that links several letters/words to form a group
of words can be taken into account. Modeling a group of words to represent one word
or a virtual word allows for the recognition of a unique sequence of words that are not
themselves considered to be words but that can be categorized as a single word.
Creating a system to recognize groups of words allows for a reduced amount of error
during speech recognition, where punctuation (e.g., "/", etc.) will be considered in a
manner in which any title or name can be recognized and classified.

Re claims 7, 9, and 22, Rigazio teaches the language model generation and 
accumulation apparatus according to Claim 1, further comprising 

a syntactic tree generation unit operable to perform morphemic analysis as well 
as syntactic analysis of a text (Col. 5 lines 42-63), and generate a syntactic tree in 
which the text is structured by a plurality of layers, focusing on a node that is on
the syntactic tree (Col. 5 lines 42-63) and that has been selected on the basis of a
predetermined criterion (Col. 4 lines 4-55 & Fig. 2), 

wherein the higher-level N-gram language model (Col. 6 lines 11-20) generation 
and accumulation unit generates the higher-level N-gram language model for syntactic 
tree, using a first subtree (Col. 5 lines 42-63 & Fig. 4) that constitutes an upper layer 
from the focused node (Col. 4 lines 4-55 & Fig. 2), and 

the lower-level N-gram language model (Col. 6 lines 11-20) generation and 
accumulation unit generates the lower-level N-gram language model for syntactic tree, 
using a second subtree (Col. 5 lines 42-63 & Fig. 4) that constitutes a lower layer from 
the focused node (Col. 4 lines 4-55 & Fig. 2) 

However, Rigazio fails to teach a morphemic analysis (Pentheroudakis Col. 6 line
44 - Col. 7 line 14).

Pentheroudakis teaches a classification of words or a group of words that can 
represent a title, where a lexicon lookup engine 208 first accesses lexicon 212, which 
may illustratively be a computer readable dictionary, or simply a word list, to determine 
whether the tokens in the proposed segmentation are recognized by, or contained in, 
lexicon 212. In addition, linguistic knowledge component 206 may include 
morphological analyzer 210. For example, if lexicon 212 contains only uninflected word 
forms (i.e., lemmas), then a morphological analysis is desirable to reduce, say, the token
"brothers-in-law" to the dictionary form "brother-in-law." Pentheroudakis teaches that
morphological analyzer 210 can also do more than simply convert words to uninflected
forms. For example, morphological analyzer 210 also illustratively includes a number
morphological component 216 and a punctuation morphological component 218. These 
two components illustratively convert numbers and punctuation characters to values 
which will be recognized by lexicon 212 as well. Additionally, if a sub-token is 
successfully looked up in lexicon 212, and thus validated by linguistic knowledge 
component 206, that sub-token will not be further broken down. Instead, it is simply 
passed back to tokenizer engine 202 along with an indication that it has been validated. 

Pentheroudakis teaches that linguistic knowledge component 206 also illustratively
invokes morphological analyzer 210 to assist in recognizing "virtual" words in the 
language (tokens that need to be treated as single words by the system, even though 
they are not listed in the dictionary). For instance, tokens such as numbers, electronic 
mail addresses, drive path names, URLs, emoticons, and the like, can be represented 
as a single word. Morphological analyzer 210 can assist in recognizing each segment 
as an actual word, or as a virtual word, by identifying it as a virtual word or reducing it to 
a normalized form for recognition in lexicon 212. 

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to analyze a word string class into morphemes, the smallest
language units having meaning. Morphologically analyzing text or a plurality of texts
allows for a diverse recognition of data, where punctuation that links several
letters/words to form a group of words can be taken into account. Modeling a group of
words to represent one word allows for the recognition of a unique sequence of words
that are not themselves considered to be words but that can be categorized as a single
word. Creating a system to recognize groups of words allows for a reduced amount of
error during speech recognition, where punctuation (e.g., "/", etc.) will be considered in
a manner in which any title or name can be recognized and classified.

Re claims 8, 10, and 23, Rigazio teaches the language model (Col. 6 lines 11-20)
generation and accumulation apparatus according to Claim 7,

wherein the lower-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit includes 

a language model generation exception word judgment unit operable to judge a
specific word appearing in the second subtree (Col. 5 lines 42-63) as an exception word
based on a predetermined linguistic property (Col. 4 lines 30-55 & Fig. 2), the exception
word being a word not being included as a constituent word of any subtree (Col. 4 lines
30-55 & Fig. 2),

the lower-level N-gram language model generation and accumulation unit
generates the lower-level N-gram language model (Col. 4 lines 30-55 & Fig. 2) by
dividing the exception word into (i) a syllable that is a basic phonetic unit constituting a
pronunciation of the word (Col. 4 lines 30-55 & Fig. 2) and (ii) a unit that is obtained by
combining syllables, and then by modeling a sequence made up of the syllable and the
unit obtained by combining syllables in dependency on a location of the exception word
in the syntactic tree (Col. 5 lines 42-63) and on the linguistic property of the exception
word (Col. 4 lines 30-55 & Fig. 2)

4. Claims 6, 11, 12, 19-21 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Rigazio et al. US 6182039 B1 (hereinafter Rigazio) in view of
Pentheroudakis et al. US 7092871 B2 (hereinafter Pentheroudakis) further in view
of Bakis et al. US 6023673 A (hereinafter Bakis).

Re claims 6 and 19, Rigazio teaches the language model generation and 
accumulation apparatus according to Claim 1,

wherein the lower-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit includes an exception word judgment unit operable to judge 
whether or not a specific word out of a plurality of words that appear in the word string 
class should be treated as an exception word (Col. 4 lines 4-55 & Fig. 2), based on a 
linguistic property of the specific word, and divides the exception word into (i) a syllable 
that is a basic phonetic unit constituting a pronunciation of the exception word (Col. 4 
lines 4-55 & Fig. 2) and (ii) a unit that is obtained by combining syllables based on a 
judgment result the exception word being (Col. 4 lines 4-55 & Fig. 2), 

the language model generation and accumulation apparatus further comprises a 
class dependent syllable N-gram generation and accumulation unit operable to 
generate class dependent syllable N-grams by modeling a sequence made up of the 
syllable and the unit obtained by combining syllables and by providing a language 
likelihood (Col. 1 lines 31-39) to the sequence in dependency on either the word string 
class or the linguistic property of the exception word (Col. 4 lines 4-55 & Fig. 2), 

However, Rigazio fails to teach a word not being included as a constituent word 
of the word string class (Pentheroudakis Col. 6 line 44 - Col. 7 line 14). 

accumulate the generated class dependent syllable N-grams (Pentheroudakis
Col. 6 line 44 - Col. 7 line 14)

Pentheroudakis teaches a classification of words or a group of words that can 
represent a title, where a lexicon lookup engine 208 first accesses lexicon 212, which 
may illustratively be a computer readable dictionary, or simply a word list, to determine 
whether the tokens in the proposed segmentation are recognized by, or contained in, 
lexicon 212. In addition, linguistic knowledge component 206 may include 
morphological analyzer 210. For example, if lexicon 212 contains only uninflected word 
forms (i.e., lemmas), then a morphological analysis is desirable to reduce, say, the token
"brothers-in-law" to the dictionary form "brother-in-law." Pentheroudakis teaches that
morphological analyzer 210 can also do more than simply convert words to uninflected
forms. For example, morphological analyzer 210 also illustratively includes a number 
morphological component 216 and a punctuation morphological component 218. These 
two components illustratively convert numbers and punctuation characters to values 
which will be recognized by lexicon 212 as well. Additionally, if a sub-token is 
successfully looked up in lexicon 212, and thus validated by linguistic knowledge 
component 206, that sub-token will not be further broken down. Instead, it is simply 
passed back to tokenizer engine 202 along with an indication that it has been validated. 

Pentheroudakis teaches that linguistic knowledge component 206 also illustratively
invokes morphological analyzer 210 to assist in recognizing "virtual" words in the 
language (tokens that need to be treated as single words by the system, even though 
they are not listed in the dictionary). For instance, tokens such as numbers, electronic 
mail addresses, drive path names, URLs, emoticons, and the like, can be represented
as a single word. Morphological analyzer 210 can assist in recognizing each segment 
as an actual word, or as a virtual word, by identifying it as a virtual word or reducing it to 
a normalized form for recognition in lexicon 212. 

However, Rigazio in view of Pentheroudakis fails to teach the language likelihood 
being a logarithm value of a probability (Bakis Col. 4 line 63 - Col. 5 line 26). 

Bakis teaches that, in order to find the best L prototypes in the target level M,
likelihoods are successively calculated starting from the top level k=1. In the top level,
log-likelihoods for all N.sub.1 prototypes in that level are calculated, and the results 
sorted. The log-likelihood is defined as the probability that the parameter values of a 
prototype vector signal match the feature values of a feature vector signal under 
consideration. Starting with the best prototype from the sorted list, i.e., the one with the 
highest log-likelihood. 
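
As a minimal sketch of the scoring and sorting step described above, the following computes a log-likelihood for each prototype at one level and sorts the results from best to worst. The diagonal-covariance Gaussian form of the prototypes, the prototype names, and the numeric values are assumptions made for illustration; the cited passage does not specify them, and the hierarchical descent through lower levels is not shown.

```python
import math


def gaussian_log_likelihood(feature, mean, variance):
    """Log density of an assumed diagonal-covariance Gaussian prototype at the feature vector."""
    total = 0.0
    for x, m, v in zip(feature, mean, variance):
        total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def rank_prototypes(feature, prototypes):
    """Return (name, log_likelihood) pairs sorted from highest to lowest score."""
    scored = [
        (name, gaussian_log_likelihood(feature, mean, var))
        for name, (mean, var) in prototypes.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical top-level prototypes: name -> (mean vector, variance vector).
prototypes = {
    "p1": ([0.0, 0.0], [1.0, 1.0]),
    "p2": ([3.0, 3.0], [1.0, 1.0]),
}
ranking = rank_prototypes([0.2, -0.1], prototypes)
```

Working in the log domain turns products of small probabilities into sums, which keeps the scores numerically stable while preserving their order.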

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to model text or a plurality of texts so as to include a word string
class, where a likelihood based on a logarithmic probability is calculated. Modeling a
group of words to represent one word allows for the recognition of a unique sequence of
words that are not themselves considered to be words but that can be categorized as a
single word. Creating a system to recognize groups of words allows for a reduced
amount of error during speech recognition, where punctuation (e.g., "/", etc.) will be
considered in a manner in which any title or name can be recognized and classified.
Additionally, using a logarithmic probability allows for the coverage of a large range of
data which can be ranked when candidate matches are found, where a system that
learns or is trainable can expand its models/dictionaries to a broad range through the
use of a log scale.

Re claims 11 and 12, Rigazio teaches the language model generation and
accumulation apparatus according to Claim 1,

wherein the higher-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit generates the higher-level N-gram language model in which each
(Col. 4 lines 30-55 & Fig. 2)

However, Rigazio fails to teach a sequence of N words including the word string 
class is associated (Pentheroudakis Col. 6 line 44 - Col. 7 line 14) 

Pentheroudakis teaches a classification of words or a group of words that can 
represent a title, where a lexicon lookup engine 208 first accesses lexicon 212, which 
may illustratively be a computer readable dictionary, or simply a word list, to determine 
whether the tokens in the proposed segmentation are recognized by, or contained in, 
lexicon 212. In addition, linguistic knowledge component 206 may include 
morphological analyzer 210. For example, if lexicon 212 contains only uninflected word 
forms (i.e., lemmas), then a morphological analysis is desirable to reduce, say, the token
"brothers-in-law" to the dictionary form "brother-in-law." Pentheroudakis teaches that
morphological analyzer 210 can also do more than simply convert words to uninflected
forms. For example, morphological analyzer 210 also illustratively includes a number 
morphological component 216 and a punctuation morphological component 218. These
two components illustratively convert numbers and punctuation characters to values 
which will be recognized by lexicon 212 as well. Additionally, if a sub-token is 
successfully looked up in lexicon 212, and thus validated by linguistic knowledge 
component 206, that sub-token will not be further broken down. Instead, it is simply 
passed back to tokenizer engine 202 along with an indication that it has been validated. 

Pentheroudakis teaches that linguistic knowledge component 206 also illustratively
invokes morphological analyzer 210 to assist in recognizing "virtual" words in the
language (tokens that need to be treated as single words by the system, even though 
they are not listed in the dictionary). For instance, tokens such as numbers, electronic 
mail addresses, drive path names, URLs, emoticons, and the like, can be represented 
as a single word. Morphological analyzer 210 can assist in recognizing each segment 
as an actual word, or as a virtual word, by identifying it as a virtual word or reducing it to 
a normalized form for recognition in lexicon 212. 

However, Rigazio in view of Pentheroudakis fails to teach a probability at which 
said each sequence of N words (Bakis Col. 4 line 63 - Col. 5 line 26). 

Bakis teaches that, in order to find the best L prototypes in the target level M,
likelihoods are successively calculated starting from the top level k=1. In the top level, 
log-likelihoods for all N.sub.1 prototypes in that level are calculated, and the results 
sorted. The log-likelihood is defined as the probability that the parameter values of a 
prototype vector signal match the feature values of a feature vector signal under 
consideration. Starting with the best prototype from the sorted list, i.e., the one with the
highest log-likelihood. 

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to model text or a plurality of texts so as to include a word string
class, where a likelihood based on a probability is calculated. Modeling a group of words
to represent one word allows for the recognition of a unique sequence of words that are
not themselves considered to be words but that can be categorized as a single word.
Creating a system to recognize groups of words allows for a reduced amount of error
during speech recognition, where punctuation (e.g., "/", etc.) will be considered in a
manner in which any title or name can be recognized and classified. Additionally, using
a logarithmic probability allows for the coverage of a large range of data which can be
ranked when candidate matches are found, where a system that learns or is trainable
can expand its models/dictionaries to a broad range through the use of a log scale.

Re claim 20, Rigazio teaches the language model generation and accumulation 
apparatus according to Claim 19, further comprising 

a syntactic tree generation unit operable to perform morphemic analysis as well 
as syntactic analysis of a text (Col. 5 lines 42-63), and generate a syntactic tree in 
which the text is structured by a plurality of layers, focusing on a node that is on
the syntactic tree (Col. 5 lines 42-63) and that has been selected on the basis of a
predetermined criterion (Col. 4 lines 4-55 & Fig. 2), 

wherein the higher-level N-gram language model (Col. 6 lines 11-20) generation
and accumulation unit generates the higher-level N-gram language model for syntactic 
tree, using a first subtree (Col. 5 lines 42-63 & Fig. 4) that constitutes an upper layer 
from the focused node (Col. 4 lines 4-55 & Fig. 2), and 

the lower-level N-gram language model (Col. 6 lines 11-20) generation and
accumulation unit generates the lower-level N-gram language model for syntactic tree, 
using a second subtree (Col. 5 lines 42-63 & Fig. 4) that constitutes a lower layer from 
the focused node (Col. 4 lines 4-55 & Fig. 2) 

the speech recognition apparatus comprises: 

an acoustic processing unit operable to generate feature parameters from the
speech (Col. 4 lines 30-55 & Fig. 2);

a word comparison unit operable to compare a pronunciation of each word with
each of the feature parameters (Col. 4 lines 30-55 & Fig. 2), and generate a set of word
hypotheses including an utterance segment of each word and an acoustic likelihood of 
each word (Col. 1 lines 31-39); 

a word string hypothesis (Col. 12 lines 23-41) generation unit operable to 
generate a word string hypothesis from the set of word hypotheses with reference to the 
higher-level N-gram language model for syntactic tree (Col. 5 lines 42-63) and the 
lower-level N-gram language model for syntactic tree (Col. 5 lines 42-63), and generate 
a result of the speech recognition 

However, Rigazio fails to teach a morphemic analysis (Pentheroudakis Col. 6 line 
44 - Col. 7 line 14). 

Pentheroudakis teaches a classification of words or a group of words that can
represent a title, where a lexicon lookup engine 208 first accesses lexicon 212, which 
may illustratively be a computer readable dictionary, or simply a word list, to determine 
whether the tokens in the proposed segmentation are recognized by, or contained in, 
lexicon 212. In addition, linguistic knowledge component 206 may include 
morphological analyzer 210. For example, if lexicon 212 contains only uninflected word 
forms (i.e., lemmas), then a morphological analysis is desirable to reduce, say, the token
"brothers-in-law" to the dictionary form "brother-in-law." Pentheroudakis teaches that
morphological analyzer 210 can also do more than simply convert words to uninflected
forms. For example, morphological analyzer 210 also illustratively includes a number 
morphological component 216 and a punctuation morphological component 218. These 
two components illustratively convert numbers and punctuation characters to values 
which will be recognized by lexicon 212 as well. Additionally, if a sub-token is 
successfully looked up in lexicon 212, and thus validated by linguistic knowledge 
component 206, that sub-token will not be further broken down. Instead, it is simply 
passed back to tokenizer engine 202 along with an indication that it has been validated. 

Pentheroudakis teaches a linguistic knowledge component 206 also illustratively 
invokes morphological analyzer 210 to assist in recognizing "virtual" words in the 
language (tokens that need to be treated as single words by the system, even though 
they are not listed in the dictionary). For instance, tokens such as numbers, electronic 
mail addresses, drive path names, URLs, emoticons, and the like, can be represented 
as a single word. Morphological analyzer 210 can assist in recognizing each segment 
as an actual word, or as a virtual word, by identifying it as a virtual word or reducing it to 
a normalized form for recognition in lexicon 212. 
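
The lookup-and-normalize flow described above can be illustrated with a minimal sketch. It is not taken from Pentheroudakis or from this Office action: the lexicon contents, the regular expressions for "virtual" words, and the function names `lemmatize` and `validate` are all hypothetical, and the plural stripping is deliberately crude.

```python
import re

# Hypothetical lexicon holding only uninflected forms (lemmas).
LEXICON = {"brother-in-law", "visit", "my"}

# Hypothetical patterns for "virtual" words: tokens treated as a single
# word even though they are not listed in the lexicon.
VIRTUAL_PATTERNS = {
    "NUMBER": re.compile(r"^\d+(\.\d+)?$"),
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "URL": re.compile(r"^https?://\S+$"),
}


def lemmatize(token):
    """Crude plural stripping on the head of a (possibly hyphenated) token."""
    head, sep, tail = token.partition("-")
    if head.endswith("s"):
        head = head[:-1]
    return head + sep + tail


def validate(token):
    """Return (normalized_form, kind) if the token is validated, else None."""
    lowered = token.lower()
    if lowered in LEXICON:
        return lowered, "lexicon"       # found directly; no further breakdown
    for kind, pattern in VIRTUAL_PATTERNS.items():
        if pattern.match(token):
            return kind, "virtual"      # normalized to a virtual-word label
    lemma = lemmatize(lowered)
    if lemma in LEXICON:
        return lemma, "lemma"           # validated after morphological reduction
    return None
```

Under these assumptions, `validate("brothers-in-law")` returns `("brother-in-law", "lemma")`, and a token that matches nothing returns `None`, signalling that further segmentation would be needed.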

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to include a word string class that is analyzed into morphemes, 
i.e., the smallest language units having meaning. Morphologically analyzing text or a plurality of text 
allows for a diverse recognition of data, where punctuation can be taken into account 
that links several letter/words to form a group of words. Modeling a group of words to 
represent one word allows for the recognition of a unique sequence of words that are 
not considered to be words themselves but can be categorized as a word. 
Creating a system to recognize groups of words allows for a reduced amount of error 
during speech recognition, where punctuation (i.e. T, etc) will be considered in a 
manner in which any title or name can be recognized and classified. 

Re claim 21, Rigazio teaches the apparatus according to Claim 20, 
wherein the lower-level N-gram language model (Col. 6 lines 11-20) generation 
and accumulation unit includes 

a language model generation exception word judgment unit operable to judge a 
specific word appearing in the second subtree (Col. 5 lines 42-63) as an exception word 
based on a predetermined linguistic property (Col. 4 lines 30-55 & Fig. 2), the exception 
word being a word not being included as a constituent word of any subtree (Col. 4 lines 
30-55 & Fig. 2), 
the lower-level N-gram language model generation and accumulation unit 
generates the lower-level N-gram language model (Col. 4 lines 30-55 & Fig. 2) by 
dividing the exception word into (i) a syllable that is a basic phonetic unit constituting a 
pronunciation of the word (Col. 4 lines 30-55 & Fig. 2) and (ii) a unit that is obtained by 
combining syllables, and then by modeling a sequence made up of the syllable and the 
unit obtained by combining syllables in dependency on a location of the exception word 
in the syntactic tree (Col. 5 lines 42-63) and on the linguistic property of the exception 
word (Col. 4 lines 30-55 & Fig. 2) 

the word string hypothesis generation unit generates the result of the speech 
recognition (Col. 12 lines 23-41). 

Re claims 24 and 25, Rigazio teaches the speech recognition apparatus 
according to Claim 14, 

wherein the higher-level N-gram language model (Col. 6 lines 11-20) generation 
and accumulation unit generates the higher-level N-gram language model in which each 
sequence of N words (Col. 4 lines 30-55 & Fig. 2) 

the speech recognition apparatus comprises 

a word string hypothesis generation unit operable to evaluate a word string 
hypothesis (Col. 12 lines 23-41). 

However, Rigazio in view of Pentheroudakis fails to teach that each sequence of 
N words including the word string class is associated with a probability at which 
that sequence of words occurs (Bakis Col. 4 line 63 - Col. 5 line 26), and 
multiplying each probability at which each sequence of N words including the 
word string class occurs (Bakis Col. 4 line 63 - Col. 5 line 26). 

Bakis teaches in order to find the best L prototypes in the target level M, 
likelihoods are successively calculated starting from the top level k=1. In the top level, 
log-likelihoods for all N.sub.1 prototypes in that level are calculated, and the results 
sorted. The log-likelihood is defined as the probability that the parameter values of a 
prototype vector signal match the feature values of a feature vector signal under 
consideration. Starting with the best prototype from the sorted list, i.e., the one with the 
highest log-likelihood. 
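
The top-level step described above (score every prototype, then sort) can be sketched as follows. This is an illustration only and is not Bakis's implementation: the diagonal-covariance Gaussian form of a prototype and the names `log_likelihood` and `rank_prototypes` are assumptions made for the example.

```python
import math


def log_likelihood(feature, mean, var):
    """Log of a diagonal-covariance Gaussian density (an assumed prototype form)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(feature, mean, var)
    )


def rank_prototypes(feature, prototypes):
    """Score every prototype in one level and sort from best to worst."""
    scored = [
        (name, log_likelihood(feature, mean, var))
        for name, (mean, var) in prototypes.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A hierarchical search would then descend only from the highest-ranked entries of the sorted list into the next level, rather than scoring every prototype at every level.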

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to model text or a plurality of text to include a word string class, 
where a likelihood based on a probability is calculated. Modeling a group of words to 
represent one word allows for the recognition of a unique sequence of words that are 
not considered to be words themselves but can be categorized as a word. 
Creating a system to recognize groups of words allows for a reduced amount of error 
during speech recognition, where punctuation (i.e. ,, - , \ 7", etc) will be considered in a 
manner in which any title or name can be recognized and classified. Additionally, the 
multiplication by a probability allows for a score to be generated based on a hierarchical 
data set for the purpose of ranking, such as after a word/string is classified, ranking the 
classification further to prune any lower ranked candidates by the scaled or multiplied 
probability. 
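
The multiplication of probabilities discussed above can be illustrated with a small sketch. All values and names here are hypothetical and are not drawn from Rigazio, Pentheroudakis, Bakis, or the claims: the sketch assumes N = 2 (bigrams), a single word string class written `<title>`, and two toy probability tables. Log-probabilities are summed, which is equivalent to multiplying the probabilities while avoiding numerical underflow.

```python
import math

# Hypothetical higher-level bigram model: P(token | previous token),
# where <title> stands for a word string class.
HIGHER = {
    ("<s>", "watch"): 0.2,
    ("watch", "<title>"): 0.1,
    ("<title>", "</s>"): 0.5,
}

# Hypothetical lower-level bigram model: P(word | previous word)
# for the words inside the <title> class.
LOWER = {
    ("<s>", "star"): 0.3,
    ("star", "wars"): 0.6,
    ("wars", "</s>"): 0.9,
}


def sequence_log_prob(tokens, model):
    """Sum the log-probabilities of consecutive bigrams, padded with <s> and </s>."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return sum(math.log(model[(a, b)]) for a, b in zip(padded, padded[1:]))


def hypothesis_log_prob(tokens, class_words):
    """Combine the higher-level score with the score of the words inside the class."""
    return sequence_log_prob(tokens, HIGHER) + sequence_log_prob(class_words, LOWER)
```

With these toy tables, the hypothesis "watch <title>" with the class expanded to "star wars" has probability 0.2 x 0.1 x 0.5 x 0.3 x 0.6 x 0.9 = 0.00162. Competing hypotheses could be ranked by this score and the lower-ranked ones pruned.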
Conclusion 

5. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. US 6243669 B1, US 4797930 A, US 6654721 B2, US 5477451 
A, US 5510981 A, US 5870706 A, US 20020032564 A1, US 20020042707 A1, US 
6336108 B1, US 6839669 B1. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Michael C. Colucci whose telephone number is (571)- 
270-1847. The examiner can normally be reached on 9:30 am - 6:00 pm, Monday- 
Friday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil can be reached on (571)-272-7602. The fax phone 
number for the organization where this application or proceeding is assigned is 571- 
273-8300. 
Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 